AD616408 




FUNCTIONAL  EQUATIONS  IN  THE  THEORY  OF 
DYNAMIC  PROGRAMMING — XI : 

LIMIT  THEOREMS 


Richard  Bellman 
Mathematics  Division 
The  RAND  Corporation 


P-1843


30 November 1959


Approved for OTS release




Reproduced  by 

The RAND Corporation • Santa Monica • California
The  views  expressed  in  this  paper  are  not  necessarily  those  of  the  Corporation 


SUMMARY 


P-1843 

11-30-59 



In this paper we wish to present a limit theorem valid
for a general class of Markovian decision processes. The
result is of interest because of the simple conditions which
are imposed and the rather simple argument which is used.




FUNCTIONAL EQUATIONS IN THE THEORY OF
DYNAMIC PROGRAMMING--XI: LIMIT THEOREMS

Richard Bellman


1. Introduction

In this paper we wish to present a limit theorem valid
for a general class of Markovian decision processes [1]. The
result is of interest because of the simple conditions which
are imposed and the rather simple argument which is used.

Let p be an element of a finite set P, and q be an
element of another finite set Q. We think of p as the
state vector of a discrete dynamic programming process, and q
as the decision variable at each stage. A choice of q results
in a transformation from p to T(p,q), taken to be an
element of P, and in a return of b(p,q), a scalar function
defined for all $p \in P$ and $q \in Q$.

Denoting by $p_1, p_2, \ldots, p_N$ the succession of states, and
by $q_1, q_2, \ldots, q_N$ the sequence of decisions, we have as the
overall return of an N-stage process the function

(1)  $R_N = b(p_1,q_1) + b(p_2,q_2) + \cdots + b(p_N,q_N)$.

We wish to choose the $q_i$ so as to maximize $R_N$.

Introducing the function $f_N(p_1)$ defined by the relation

(2)  $f_N(p_1) = \max_{\{q_i\}} R_N$

for all $p_1 \in P$ and N = 1, 2, ..., we have the recurrence
relation

(3)  $f_N(p_1) = \max_{q_1} \bigl[\, b(p_1,q_1) + f_{N-1}(T(p_1,q_1)) \,\bigr]$,

for $N \ge 2$, with

(4)  $f_1(p_1) = \max_{q_1} b(p_1,q_1)$.
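The recurrence (3), started from (4), can be evaluated directly when P and Q are small. The following is a minimal sketch under stated assumptions: the three-state cyclic process, the transformation `T`, and the return table `B` are hypothetical illustrations chosen here, not taken from the paper.

```python
# Hypothetical toy process (not from the paper): three states on a cycle.
# Decision q = 0 stays in place, q = 1 advances to the next state.
P = [0, 1, 2]
Q = [0, 1]

def T(p, q):
    """Assumed transformation: stay (q = 0) or advance cyclically (q = 1)."""
    return (p + q) % 3

# Assumed nonnegative returns b(p, q), stored as a lookup table.
B = {(0, 0): 1, (0, 1): 0,
     (1, 0): 0, (1, 1): 2,
     (2, 0): 3, (2, 1): 1}

def b(p, q):
    return B[(p, q)]

def value_functions(N):
    """Return [f_1, ..., f_N], each a dict mapping a state p to f_k(p)."""
    # Relation (4): one-stage process.
    f = [{p: max(b(p, q) for q in Q) for p in P}]
    # Relation (3): build f_k from f_{k-1} for k = 2, ..., N.
    for _ in range(2, N + 1):
        prev = f[-1]
        f.append({p: max(b(p, q) + prev[T(p, q)] for q in Q) for p in P})
    return f

if __name__ == "__main__":
    for k, fk in enumerate(value_functions(3), start=1):
        print(k, fk)
```

Each stage costs one pass over all pairs (p, q), so N stages take on the order of N·|P|·|Q| evaluations of b and T.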

It is reasonable to expect a "steady-state" policy which
is approached asymptotically as $N \to \infty$; cf. [2,3,4] for
results of this nature. The study of the asymptotic behavior
of the sequence $f_N(p_1)$ determined by (3) is a problem of
some difficulty, and usually requires some detailed knowledge
of the transformation T(p,q) and the function b(p,q). We
shall show in what follows that a fairly general result can be
easily obtained under mild assumptions. Unfortunately, although
we can derive the asymptotic form of $f_N(p)$, we cannot
assert the existence of an asymptotic policy. Further
assumptions appear to be required for this.

2.  Statement  of  Result 

Let  us  make  the  following  two  assumptions: 

(1)  (a) $b(p,q) \ge 0$, $p \in P$, $q \in Q$.

     (b) T(p,q) is such that by means of a suitable choice
         of q's, $q_1, q_2, \ldots, q_K$, it is possible to go from
         any element $p_1 \in P$ to any other element $p_2 \in P$.
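Assumption (b) can be checked mechanically for a given finite process. The sketch below uses breadth-first search to find the fewest decisions needed to pass between each pair of states, and reports the largest such count, which is the uniform bound the proof in Section 3 calls M. The cyclic transformation `T` and the sets `P`, `Q` are hypothetical examples introduced here, not part of the paper.

```python
from collections import deque

# Hypothetical toy process (not from the paper): three states on a cycle.
P = [0, 1, 2]
Q = [0, 1]

def T(p, q):
    """Assumed transformation: stay (q = 0) or advance cyclically (q = 1)."""
    return (p + q) % 3

def steps_from(start):
    """Breadth-first search: fewest decisions needed to reach each state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in Q:
            nxt = T(p, q)
            if nxt not in dist:
                dist[nxt] = dist[p] + 1
                queue.append(nxt)
    return dist

def accessibility_bound():
    """Return the largest shortest-path length, or None if (b) fails."""
    worst = 0
    for p in P:
        dist = steps_from(p)
        if len(dist) < len(P):   # some state cannot be reached from p
            return None
        worst = max(worst, max(dist.values()))
    return worst

if __name__ == "__main__":
    print(accessibility_bound())
```

Because P is finite, the bound exists whenever every state is reachable from every other; the search only makes that bound explicit.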

We wish to establish

Theorem. Under the foregoing assumptions, for all $p_1$,

(2)  $f_N(p_1) \sim Na$

as $N \to \infty$, where a is independent of $p_1$.
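A numerical illustration of the asymptotic statement may help fix ideas. This is a sketch only: the process below (states, decisions, `T`, and the return table `B`) is a hypothetical example satisfying assumptions (a) and (b), not one considered in the paper. It tabulates $f_N(p)/N$ for increasing N, keeping only the most recent value function.

```python
# Hypothetical toy process (not from the paper): three states on a cycle.
P = [0, 1, 2]
Q = [0, 1]

def T(p, q):
    """Assumed transformation: stay (q = 0) or advance cyclically (q = 1)."""
    return (p + q) % 3

# Assumed nonnegative returns b(p, q).
B = {(0, 0): 1, (0, 1): 0,
     (1, 0): 0, (1, 1): 2,
     (2, 0): 3, (2, 1): 1}

def ratios(N):
    """Return {p: f_N(p) / N}, iterating the recurrence N - 1 times."""
    f = {p: max(B[(p, q)] for q in Q) for p in P}             # f_1
    for _ in range(N - 1):
        f = {p: max(B[(p, q)] + f[T(p, q)] for q in Q) for p in P}
    return {p: f[p] / N for p in P}

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, ratios(N))
```

In this toy example the best long-run behavior is to reach state 2 and remain there, collecting 3 per stage, so the ratios approach 3 from every starting state; the starting state affects only a bounded correction that vanishes after division by N.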


3. Proof of Theorem

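Referring to (1.1), we may write

(1)  $f_{m+n}(p_1) = \max_{q_1,\ldots,q_m} \bigl[\, b(p_1,q_1) + \cdots + b(p_m,q_m) + f_n(T_m) \,\bigr]$,

where $T_m$ is the state attained after the choice of $q_1, q_2, \ldots, q_m$.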

Introduce the new sequence $\{u_n\}$ by means of the relation

(2)  $u_n = \max_p f_n(p)$.


Then, it is clear from (1) that

(3)  $u_{m+n} \le u_m + u_n$

for $m, n \ge 1$. It is well known that this inequality implies
that there exists a constant a such that

(4)  $u_n \sim na$

as $n \to \infty$, [5].*
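The step from the decomposition of $f_{m+n}$ to the subadditive inequality can be spelled out as follows. This is a sketch of the reasoning in the notation of this section; the intermediate lines are filled in here and are not written out in the original.

```latex
\begin{align*}
f_{m+n}(p_1)
  &= \max_{q_1,\dots,q_m}\bigl[\,b(p_1,q_1)+\cdots+b(p_m,q_m)+f_n(T_m)\,\bigr] \\
  &\le \max_{q_1,\dots,q_m}\bigl[\,b(p_1,q_1)+\cdots+b(p_m,q_m)\,\bigr]
       + \max_{p} f_n(p) \\
  &= f_m(p_1) + u_n \;\le\; u_m + u_n .
\end{align*}
% Taking the maximum over p_1 on the left gives u_{m+n} <= u_m + u_n.
% For a subadditive sequence, Fekete's lemma then yields
\[
  \lim_{n\to\infty} \frac{u_n}{n} \;=\; \inf_{n\ge 1} \frac{u_n}{n} \;=\; a ,
\]
% and a >= 0 here because b(p,q) >= 0 forces u_n >= 0.
```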

* This result is used in the foregoing fashion by
Furstenberg and Kesten in a forthcoming paper.

Let us now show that $f_n(p_1) \sim na$ as $n \to \infty$. Let, for
each n, $p_n$ be a value of p for which $f_n(p)$ assumes the
value $\max_p f_n(p)$. Choose a sequence of q's, $q_1, q_2, \ldots, q_K$,
which transforms $p_1$ into the value $p_{n-M}$. We know, by
assumption, that the number of transformations required to go
from any point $p_1$ to any other point is uniformly bounded.
Take M to be this bound.

By virtue of the nonnegativity of b(p,q), we have

(5)  $f_n(p_1) \ge f_{n-K}(p_{n-M}) \ge f_{n-M}(p_{n-M})$.

Since $f_n(p_1) \le f_n(p_n)$, by definition of the element $p_n$,
we have for large $n \ge n(\epsilon)$,

(6)  $n(a + \epsilon) \ge f_n(p_n) \ge f_n(p_1) \ge f_{n-M}(p_{n-M}) \ge (n - M)(a - \epsilon)$.

Hence

(7)  $f_n(p_1) \sim na$

as $n \to \infty$, the desired result.
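The final passage to the limit can be made explicit by dividing the two-sided bound on $f_n(p_1)$ by n. The lines below are a sketch of that step, added here for completeness; $\epsilon$ and $n(\epsilon)$ are as in the bound above, and M is the uniform bound on the number of transformations.

```latex
\[
  a + \epsilon \;\ge\; \frac{f_n(p_1)}{n}
  \;\ge\; \Bigl(1 - \frac{M}{n}\Bigr)(a - \epsilon),
  \qquad n \ge n(\epsilon).
\]
% Since M is fixed, M/n -> 0 as n -> infinity, so
\[
  a - \epsilon \;\le\; \liminf_{n\to\infty} \frac{f_n(p_1)}{n}
  \;\le\; \limsup_{n\to\infty} \frac{f_n(p_1)}{n} \;\le\; a + \epsilon .
\]
% As epsilon > 0 is arbitrary, f_n(p_1)/n -> a for every starting state p_1.
```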